
    Object Detection Through Exploration With A Foveated Visual Field

    We present a foveated object detector (FOD) as a biologically-inspired alternative to the sliding window (SW) approach, which is the dominant method of search in computer vision object detection. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution at the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We assessed various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings. Comment: An extended version of this manuscript was published in PLOS Computational Biology (October 2017) at https://doi.org/10.1371/journal.pcbi.100574
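    The eccentricity-dependent allocation of resources described above can be caricatured as a pooling map whose regions grow with distance from fixation. Below is a minimal sketch, assuming a grayscale numpy image and a linear radius-eccentricity relationship; the grid layout, constants, and the function name foveated_pool are illustrative stand-ins, not the paper's V1 pooling model or its classifiers.

        import numpy as np

        def foveated_pool(image, fixation, grid_step=16, base_radius=2, gain=0.05):
            """Average-pool the image at a fixed grid of centres; the pooling
            radius grows linearly with eccentricity, so responses are fine
            near the fovea and coarse in the periphery."""
            h, w = image.shape
            fy, fx = fixation
            rows = range(grid_step // 2, h, grid_step)
            cols = range(grid_step // 2, w, grid_step)
            responses = np.zeros((len(rows), len(cols)))
            for i, cy in enumerate(rows):
                for j, cx in enumerate(cols):
                    ecc = np.hypot(cy - fy, cx - fx)         # eccentricity in pixels
                    r = int(base_radius * (1 + gain * ecc))  # pooling radius grows with ecc.
                    y0, y1 = max(cy - r, 0), min(cy + r + 1, h)
                    x0, x1 = max(cx - r, 0), min(cx + r + 1, w)
                    responses[i, j] = image[y0:y1, x0:x1].mean()
            return responses

        # usage: pooled feature map for a fixation at the image centre
        img = np.random.rand(128, 128)
        print(foveated_pool(img, fixation=(64, 64)).shape)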

    Evolution and Optimality of Similar Neural Mechanisms for Perception and Action during Search

    A prevailing theory proposes that the brain's two visual pathways, the ventral and dorsal, lead to distinct visual processing and world representations for conscious perception and for action. Others have claimed that perception and action share much of their visual processing. But which of these two neural architectures is favored by evolution? Successful visual search is life-critical, and here we investigate the evolution and optimality of neural mechanisms mediating perception and eye movement actions for visual search in natural images. We implement an approximation to the ideal Bayesian searcher with two separate processing streams, one controlling the eye movements and the other determining the perceptual search decisions. We virtually evolved the neural mechanisms of the searchers' two separate pathways, built from linear combinations of primary visual cortex (V1) receptive fields, by making the simulated individuals' probability of survival depend on their perceptual accuracy at finding targets in cluttered backgrounds. We find that for a variety of targets, backgrounds, and dependences of target detectability on retinal eccentricity, the mechanisms of the searchers' two processing streams converge to similar representations, showing that mismatches in the mechanisms for perception and eye movements lead to suboptimal search. Three exceptions that resulted in partial or no convergence were an organism for which the targets are equally detectable across the retina, an organism with sufficient time to foveate all possible target locations, and a strict two-pathway model with no interconnections and with differential pre-filtering based on parvocellular and magnocellular lateral geniculate cell properties. Thus, similar neural mechanisms for perception and eye movement actions during search are optimal and should be expected from the effects of natural selection on an organism with limited time to search for food that is not equi-detectable across its retina and whose perception and action neural pathways are interconnected.
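    The virtual-evolution setup lends itself to a compact caricature: two linear templates per individual, one choosing where to "foveate" and one making the perceptual report, with survival tied to report accuracy. The sketch below is a toy stand-in under invented assumptions (feature dimension, noise level, foveation modelled as one extra low-noise sample, truncation selection); it is not the paper's V1-receptive-field parameterisation.

        import numpy as np

        rng = np.random.default_rng(0)
        D, N_LOC = 32, 4                           # feature dim, candidate locations
        target = rng.standard_normal(D)

        def fitness(ind, trials=200, noise=3.0):
            """Search accuracy: the 'action' template picks where to foveate,
            the 'perception' template then reports the target location, with
            one extra low-noise sample taken at the foveated location."""
            act, per = ind
            hits = 0
            for _ in range(trials):
                loc = rng.integers(N_LOC)
                scene = noise * rng.standard_normal((N_LOC, D))
                scene[loc] += target
                fix = int(np.argmax(scene @ act))  # eye-movement choice
                fovea = (target if fix == loc else 0) + 0.5 * noise * rng.standard_normal(D)
                evidence = scene @ per
                evidence[fix] = 0.5 * (evidence[fix] + fovea @ per)
                hits += int(np.argmax(evidence) == loc)
            return hits / trials

        pop = rng.standard_normal((40, 2, D))      # population of template pairs
        for _ in range(30):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-10:]]           # top 10 survive
            pop = parents[rng.integers(10, size=40)]          # resample offspring
            pop = pop + 0.1 * rng.standard_normal(pop.shape)  # mutate

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        act, per = best / np.linalg.norm(best, axis=1, keepdims=True)
        print(act @ per)  # cosine similarity of the evolved action and perception templates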

    Human Visual Search Does Not Maximize the Post-Saccadic Probability of Identifying Targets

    Researchers have conjectured that eye movements during visual search are selected to minimize the number of saccades. The optimal Bayesian eye movement strategy minimizing saccades does not simply direct the eye to whichever location is judged most likely to contain the target but makes use of the entire retina as an information-gathering device during each fixation. Here we show that human observers do not minimize the expected number of saccades when planning eye movements in a simple visual search task composed of three tokens. In this task, the optimal eye movement strategy varied depending on the spacing between tokens (in the first experiment) or the size of tokens (in the second experiment), and changed abruptly once the separation or size surpassed a critical value. None of our observers changed strategy as a function of separation or size. Human performance fell far short of ideal, both qualitatively and quantitatively.
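    The distinction the paper tests can be made concrete with a toy three-token display: the fixation maximising the expected post-saccadic probability of a correct decision need not coincide with the most probable target location. The geometry, priors, detectability falloff, and the per-token Phi(d'/2) approximation below are all invented for illustration; the paper's ideal observer integrates information across the whole retina.

        import numpy as np
        from scipy.stats import norm

        tokens = np.array([-0.5, 0.0, 0.5])   # token positions (deg of visual angle)
        prior = np.array([0.4, 0.35, 0.25])   # belief that each token is the target

        def dprime(ecc, d0=3.0, k=1.5):
            return d0 / (1 + k * np.abs(ecc))  # detectability falls with eccentricity

        def p_correct(fix):
            """Expected probability of a correct decision after fixating `fix`
            (unbiased yes/no approximation, Phi(d'/2), applied per token)."""
            return float(np.sum(prior * norm.cdf(dprime(tokens - fix) / 2)))

        fixes = np.linspace(-1.0, 1.0, 201)
        best_fix = fixes[int(np.argmax([p_correct(f) for f in fixes]))]
        map_fix = tokens[int(np.argmax(prior))]
        print(best_fix, map_fix)  # the two prescriptions differ (centre vs leftmost token)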

    Reinforcement learning or active inference?

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn the causal structure of the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
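    The core move, replacing reward with gradient descent on free energy, can be illustrated for the perceptual half of the scheme in a few lines: a single hidden state is inferred by descending precision-weighted prediction errors under a Gaussian generative model. This is a minimal sketch with an invented generative mapping and precisions; the paper extends the same gradient descent to action (changing how the environment is sampled), which is not shown here.

        g = lambda x: x ** 2        # toy generative mapping: hidden state -> sensation
        dg = lambda x: 2 * x

        prior_mu, pi_x, pi_s = 1.0, 1.0, 4.0  # prior mean, prior and sensory precisions
        s = 2.5                               # observed sensation
        x = prior_mu                          # recognition density mean (to be inferred)

        for _ in range(500):
            eps_s = s - g(x)                  # sensory prediction error
            eps_x = x - prior_mu              # prior prediction error
            # dF/dx for Gaussian densities; descend to minimise free energy
            x -= 0.02 * (-pi_s * eps_s * dg(x) + pi_x * eps_x)

        F = 0.5 * (pi_s * (s - g(x)) ** 2 + pi_x * (x - prior_mu) ** 2)
        print(x, F)  # free-energy-minimising estimate balances data and prior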

    Combining eye and hand in search is suboptimal

    When performing everyday tasks, we often move our eyes and hand together: we look where we are reaching in order to better guide the hand. This coordinated pattern, with the eye leading the hand, is presumably optimal behaviour. But eyes and hands can move to different locations if they are involved in different tasks. To find out whether this leads to optimal performance, we studied the combination of visual and haptic search. We asked ten participants to perform a combined visual and haptic search for a target that was present in both modalities and compared their search times to those on visual-only and haptic-only search tasks. Without distractors, search times were faster for visual search than for haptic search. With many visual distractors, search times were longer for visual than for haptic search. For the combined search, performance was poorer than predicted by the optimal strategy, whereby each modality searches a different part of the display. The results are consistent with several alternative accounts, for instance with vision and touch searching independently at the same time.
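    The optimal division-of-labour benchmark can be made concrete with a serial self-terminating search model: each modality scans a disjoint subset at its own rate, and expected search time is minimised when items are split roughly in proportion to the two rates. The rates and set size below are invented for illustration.

        import numpy as np

        N = 20                   # display items; target equally likely at each
        r_v, r_h = 8.0, 2.0      # items/second scanned by vision and by touch

        def expected_time(k):
            """Expected time to find the target when vision searches k items
            and touch the remaining N - k, in parallel and without overlap."""
            t_v = np.arange(1, k + 1) / r_v      # times if target in visual subset
            t_h = np.arange(1, N - k + 1) / r_h  # times if target in haptic subset
            return (t_v.sum() + t_h.sum()) / N

        ks = np.arange(0, N + 1)
        best_k = ks[int(np.argmin([expected_time(k) for k in ks]))]
        print(best_k, N * r_v / (r_v + r_h))     # optimum ~ rate-proportional split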

    Keeping an eye on noisy movements: On different approaches to perceptual-motor skill research and training

    Contemporary theorising on the complementary nature of perception and action in expert performance has led to the emergence of different emphases in studying movement coordination and gaze behaviour. On the one hand, coordination research has examined the role that variability plays in movement control, evidencing that variability facilitates individualised adaptations during both learning and performance. On the other hand, and at odds with this principle, the majority of gaze behaviour studies have tended to average data over participants and trials, proposing the importance of universal 'optimal' gaze patterns in a given task, for all performers, irrespective of stage of learning. In this article, new lines of inquiry are considered with the aim of reconciling these two distinct approaches. The role that inter- and intra-individual variability may play in gaze behaviours is considered, before suggesting directions for future research.

    Does oculomotor inhibition of return influence fixation probability during scene search?

    Oculomotor inhibition of return (IOR) is believed to facilitate scene scanning by decreasing the probability that gaze will return to a previously fixated location. This “foraging” hypothesis was tested during scene search and in response to sudden-onset probes at the immediately previous (one-back) fixation location. The latencies of saccades landing within 1° of the previous fixation location were elevated, consistent with oculomotor IOR. However, there was no decrease in the likelihood that the previous location would be fixated relative to distance-matched controls or an a priori baseline. Saccades exhibit an overall forward bias, but this is due to a general bias to move in the same direction and for the same distance as the last saccade (saccadic momentum) rather than to a spatially specific tendency to avoid previously fixated locations. We find no evidence that oculomotor IOR has a significant impact on return probability during scene search.
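    The distance-matched control analysis described above is straightforward to sketch: compare the rate of saccades landing within 1° of the one-back fixation against saccades to control points at the same distance from the launch site but in a random direction. The random scanpaths below are placeholders for real data; the fixation format and the 1° criterion follow the description above.

        import numpy as np

        rng = np.random.default_rng(1)
        scan = rng.uniform(0, 20, size=(500, 2))  # fixation positions (deg), placeholder

        def rate_within(points, targets, radius=1.0):
            return float(np.mean(np.linalg.norm(points - targets, axis=1) < radius))

        land, launch, oneback = scan[2:], scan[1:-1], scan[:-2]
        return_rate = rate_within(land, oneback)  # saccades back to the one-back location

        # control: same distance from the launch site, random direction
        dist = np.linalg.norm(oneback - launch, axis=1, keepdims=True)
        theta = rng.uniform(0, 2 * np.pi, size=(len(launch), 1))
        control = launch + dist * np.hstack([np.cos(theta), np.sin(theta)])
        control_rate = rate_within(land, control)

        # with real scanpaths the paper finds no reduction of return_rate
        # relative to control_rate; here both are at chance by construction
        print(return_rate, control_rate)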

    Principles of sensorimotor learning.

    The exploits of Martina Navratilova and Roger Federer represent the pinnacle of motor learning. However, when considering the range and complexity of the processes that are involved in motor learning, even the mere mortals among us exhibit abilities that are impressive. We exercise these abilities when taking up new activities - whether it is snowboarding or ballroom dancing - but also engage in substantial motor learning on a daily basis as we adapt to changes in our environment, manipulate new objects and refine existing skills. Here we review recent research in human motor learning with an emphasis on the computational mechanisms that are involved.

    Visual Performance Fields: Frames of Reference

    Performance in most visual discrimination tasks is better along the horizontal than the vertical meridian (Horizontal-Vertical Anisotropy, HVA), and along the lower than the upper vertical meridian (Vertical Meridian Asymmetry, VMA), with intermediate performance at intercardinal locations. As these inhomogeneities are prevalent throughout visual tasks, it is important to understand the perceptual consequences of dissociating spatial reference frames. In all studies of performance fields so far, allocentric environmental references and egocentric observer reference frames were aligned. Here we quantified the effects of manipulating head-centric and retinotopic coordinates on the shape of visual performance fields. When observers viewed briefly presented radial arrays of Gabors and discriminated the tilt of a target relative to homogeneously oriented distractors, performance fields shifted with head tilt (Experiment 1) and with fixation (Experiment 2). These results show that performance fields shift in line with egocentric referents, corresponding to the retinal location of the stimulus.
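    The logic of the reference-frame test can be sketched numerically: under a head tilt, compare a measured performance-by-polar-angle profile against a retinal prediction (the profile rotates with the head) and an environmental prediction (it stays put). The cosine-shaped profile and 45° tilt below are synthetic stand-ins that simply encode the paper's finding.

        import numpy as np

        angles = np.deg2rad(np.arange(0, 360, 45))  # 8 probe polar angles

        def perf(a):                                # toy HVA + VMA inhomogeneity
            return 0.75 + 0.15 * np.cos(2 * a) - 0.05 * np.sin(a)

        tilt = np.deg2rad(45)                       # head tilt (cf. Experiment 1)
        measured = perf(angles - tilt)              # synthetic data: shifts with the head
        err_retinal = np.mean((measured - perf(angles - tilt)) ** 2)
        err_environment = np.mean((measured - perf(angles)) ** 2)
        print(err_retinal, err_environment)         # smaller error -> retinal frame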